A modular RNN-based method for continuous Mandarin speech recognition

نویسندگان

  • Yuan-Fu Liao
  • Sin-Horng Chen
چکیده

A new modular recurrent neural network (MRNN)-based method for continuous Mandarin speech recognition (CMSR) is proposed. The MRNN recognizer is composed of four main modules. The first is a sub-MRNN module whose function is to generate discriminant functions for all 412 base-syllables. It accomplishes the task by using four recurrent neural network (RNN) submodules. The second is an RNN module which is designed to detect syllable boundaries for providing timing cues in order to help solve the time-alignment problem. The third is also an RNN module whose function is to generate discriminant functions for 143 intersyllable diphone-like units to compensate the intersyllable coarticulation effect. The fourth is a dynamic programming (DP)-based recognition search module. Its function is to integrate the other three modules and solve the time-alignment problem for generating the recognized base-syllable sequence. A new multilevel pruning scheme designed to speed up the recognition process is also proposed. The whole MRNN can be trained by a sophisticated three-stage minimum classification error/generalized probabilistic descent (MCE/GPD) algorithm. Experimental results showed that the proposed method performed better than the maximum likelihood (ML)-trained hidden Markov model (HMM) method and is comparable to the MCE/GPD-trained HMM method. The multilevel pruning scheme was also found to be very efficient.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An RNN-based preclassification method for fast continuous Mandarin speech recognition

A novel recurrent neural network-based (RNN-based) frontend preclassification scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to discriminate each input frame for the three broad classes of initial, final, and silence. A finite state machine (FSM) is then used to classify the input frame into four states including three stable states o...

متن کامل

An RNN-Based Pre-classi cation Method for Fast Continuous Mandarin Speech Recognition

A novel RNN-based front-end pre-classiication scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to discriminate each input frame for the three broad classes of initial, nal, and silence. A nite state machine (FSM) is then used to classify the input frame into four states including three stable states of Initial (I), Final (F), and Silenc...

متن کامل

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

متن کامل

An MRNN-based method for continuous Mandarin speech recognition

A new MRNN-based method for continuous Mandarin speech recognition is proposed. The system uses five RNNs to accomplish many subtasks separately and then combine them to integrally solve the problem. They include two RNNs for the discriminations of the two sub-syllable groups of 100 RFD initials and 39 CI finals, two RNNs for the generations of dynamic weighting functions for sub-syllable’s int...

متن کامل

Prosodic modeling of Mandarin speech and its application to lexical decoding

In this paper, a new RNN-based prosodic modeling method for Mandarin speech recognition is proposed. It is performed in the post-processing stage of the acoustic decoding aiming at detecting word boundaries for assisting in the lexical decoding. It employs a simple RNN to learn the relationship between input prosodic features, extracted from the input utterance with syllable boundaries provided...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • IEEE Trans. Speech and Audio Processing

دوره 9  شماره 

صفحات  -

تاریخ انتشار 2001